Education interacts with genetic variants near GJD2, RBFOX1, LAMA2, KCNQ5 and LRRC4C to confer susceptibility to myopia

Myopia most often develops during school age, with the highest incidence in countries with intensive education systems. Interactions between genetic variants and educational exposure are hypothesized to confer susceptibility to myopia, but few such interactions have been identified. Here, we aimed to identify genetic variants that interact with education level to confer susceptibility to myopia. Two groups of unrelated participants of European ancestry from UK Biobank were studied. A ‘Stage-I’ sample of 88,334 participants whose refractive error (avMSE) was measured by autorefraction and a ‘Stage-II’ sample of 252,838 participants who self-reported their age-of-onset of spectacle wear (AOSW) but who did not undergo autorefraction. Genetic variants were prioritized via a 2-step screening process in the Stage-I sample: Step 1 was a genome-wide association study for avMSE; Step 2 was a variance heterogeneity analysis for avMSE. Genotype-by-education interaction tests were performed in the Stage-II sample, with University education coded as a binary exposure. On average, participants were 58 years-old and left full-time education when they were 18 years-old; 35% reported University level education. The 2-step screening strategy in the Stage-I sample prioritized 25 genetic variants (GWAS P < 1e-04; variance heterogeneity P < 5e-05). In the Stage-II sample, 19 of the 25 (76%) genetic variants demonstrated evidence of variance heterogeneity, suggesting the majority were true positives. Five genetic variants located near GJD2, RBFOX1, LAMA2, KCNQ5 and LRRC4C had evidence of a genotype-by-education interaction in the Stage-II sample (P < 0.002) and consistent evidence of a genotype-by-education interaction in the Stage-I sample. For all 5 variants, University-level education was associated with an increased effect of the risk allele. In this cohort, additional years of education were associated with an enhanced effect of genetic variants that have roles including axon guidance and the development of neuronal synapses and neural circuits.

Introduction Myopia (short-sightedness) is a refractive error caused by a mismatch between the focal and axial lengths of the eye. Myopia currently affects 22% of the world population and its prevalence is increasing, especially in recent birth cohorts [1,2]. Individuals with myopia are at greater risk of ocular pathologies such as glaucoma, myopic maculopathy and retinal detachment [2,3]. Refractive errors have a significant economic, societal and healthcare impact due to the requirement of sight tests, corrective aids or surgery, and the associated increased risk of blindness and sight impairment [4].
Refractive errors are highly heritable. To date, genome-wide association studies (GWAS) have identified more than 450 genetic variants associated with refractive error [5,6]. Pathway analysis of these gene variants has highlighted a diverse range of functions, signaling pathways and cellular processes involving many different retinal cell types, as well as the structure and function of the extracellular matrix. Environmental factors such as years spent in education, excessive near work, and less time outdoors are also associated with myopia [7][8][9][10][11][12]. Typically, genetic and lifestyle risk factors for refractive error have been studied separately, with few studies directly assessing the contribution from gene-environment (GxE) interaction effects [13][14][15].
A standard GWAS tests for an association between the genotype of single nucleotide polymorphisms (SNPs) and the mean level of a phenotype. By contrast, a 'variance heterogeneity' analysis tests for SNPs that exhibit a difference in phenotype variance across genotypes (Fig 1). Most SNPs with GxE interaction effects are expected to be variance heterogeneity quantitative trait loci ('vQTLs') [16]. Notably, information about each participant's level of exposure to the relevant environmental risk factor is not required to perform a variance heterogeneity analysis; only genotype and phenotype information is required. A variance heterogeneity analysis can be used as a screening step to prioritize SNPs for further assessment for GxE interaction effects, which reduces the multiple testing burden and loss of statistical power associated with testing vast numbers of variants [16][17][18][19]. Here, we adopted a 2-step genome-wide screening strategy, comprising of a standard GWAS analysis followed by a variance heterogeneity analysis, to enrich for SNPs with an above-average likelihood of involvement in a GxE interaction associated with refractive error. We then assessed the prioritized SNPs for genotype-byeducation interaction effects.

Fig 1. Two-step screening strategy for selecting SNPs likely to be involved in a GxE interaction.
The graphs provide schematic illustrations of SNPs without (left) and with (middle and right) GxE interactions, using education as an exemplar environmental exposure. A: Standard GWAS analysis will detect effects averaged across environments for the population (black dashed line). Thus, a standard GWAS analysis will detect SNPs that do not have GxE interaction effects (left panel) but it will also detect SNPs that do have GxE interaction effects (middle panel) unless the SNP effects cancel out in different environments (right panel). B: SNPs involved in GxE interactions are expected to exhibit variance heterogeneity for each genotype, which is depicted as differences in height of the green arrows (middle and right panels) whereas variance heterogeneity is not expected for a SNP with no GxE interaction (left panel). https://doi.org/10.1371/journal.pgen.1010478.g001

PLOS GENETICS
Education interacts with genetic variants to confer susceptibility to myopia

Results
Analyses were performed in 2 independent samples of UK Biobank participants (Table 1): a 'Stage-I' sample (N = 88,334) with known refractive error (avMSE) and a 'Stage-II' sample (N = 252,838) with known age-of-onset of spectacle wear (AOSW). On average, participants were 58 years-old (range 40-73 years) and left full-time education at age 18 years (range 13-26 years). Approximately one third of participants had undergone University-level education (coded as a binary variable, UniEdu). The sample size was approximately 3-fold larger for the Stage-II sample than the Stage-I sample due to autorefractor measurements only being performed in the later stage of UK Biobank recruitment.

Overview of the two-step strategy for selecting variants involved in GxE interactions
A 2-step screening strategy in the Stage-I sample, with avMSE as the phenotype-of-interest, was used to select a set of genetic variants with above-average likelihood of involvement in a GxE interaction (Fig 1).
Step 1 was a standard GWAS for refractive error, testing for a SNPphenotype 'marginal effect'. SNPs with GxE interaction effects are expected to show a marginal effect association unless the direction of effect of the SNP reverses at different levels of the environmental exposure ( Fig 1A). After 'clumping' SNPs in high linkage disequilibrium (LD) to identify independently associated variants, the GWAS identified 956 SNPs with suggestive evidence of association with avMSE at the liberal threshold of P < 1 x 10 −4 (S1 Table and Fig A in S1 Text).
Step 2 was a test for variance heterogeneity, which is also an expected feature of many of the SNPs involved in a GxE interaction (Fig 1B). Of the 956 independently associated SNPs identified from Step 1, there were 25 variants (3%) with evidence of variance heterogeneity for avMSE (variance heterogeneity P < 5 x 10 −5 ; Bonferroni correction for 956 tests). Simulations confirmed that the variance heterogeneity test (Levene's median test) maintained the correct type-I error rate under the test conditions, despite the non-normal distributions of avMSE (Box A in S1 Text). Details of the 25 variants identified by the 2-step screening strategy are presented in Table 2.

Confirmation of variance heterogeneity in the Stage-II sample
The 25 variants identified using the 2-step screening strategy in the Stage-I sample were examined for evidence of variance heterogeneity in the Stage-II sample for the outcome AOSW. Seventy six percent of the variants (19 out or 25) displayed evidence of variance heterogeneity in the Stage-II sample (Levene's test, P < 0.05; Table 2), which was a higher proportion than expected by chance (binomial test, P = 2.52 x 10 −20 ) and suggested that the majority of the 25 vQTL were true positive findings. Simulations suggested Levene's test maintained the correct type-1 error rate for the AOSW phenotype, despite its highly non-normal distribution (Box A in S1 Text).

GxE interaction between genetic variants and educational attainment
Given prior work linking education and myopia, we examined if any of the 25 vQTL identified in the 2-step screening strategy displayed genotype-by-education interaction effects. Obtaining a University degree was taken as an index of relatively high educational intensity during childhood: the validity of this index of educational exposure is discussed in Box B of S1 Text. The validity of AOSW as a surrogate phenotype for refractive error when testing for gene-by-education interaction effects is discussed in Box C in S1 Text. The GxE interaction test results are presented in Figs 2 and 3, Tables 3 and S2. Six variants, nearby the genes TOX, GJD2, LAMA2, RBFOX1, KCNQ5 and LRRC4C, had evidence of a genotype × UniEdu interaction in the Stage-II sample after accounting for multiple testing (P < 0.002; Bonferroni correction for 25 tests). Similar results were obtained when considering the interaction between genotype × EduYears (the age of completing full-time education) (P < 0.002 for all 6 variants; S2 Table). For all 6 variants, additional years spent in education was associated with an increased impact of the SNP risk allele, consistent with education compounding genetic predisposition to myopia ( Fig  2 and S2 Table). The effect size of the genotype × UniEdu interaction was highly correlated between the Stage-I and Stage-II samples for the 25 variants (Spearman ρ = 0.71, P = 9.5 x 10 −5 ). Full results of tests for genotype × UniEdu and genotype × EduYears interactions for the phenotypes avMSE, AOSW and Myopic in the Stage-I and Stage-II samples are presented in Figs 3 and B in S1 Text and S2 Table. For 5 of the 6 lead GxE variants (all those except rs36005291 nearby the TOX gene) there was independent evidence for a SNP × UniEdu and/or SNP × EduYears interaction in the Stage-I cohort (Table 3 and Fig 3).
Statistical evidence for the presence of a GxE interaction can be heavily dependent on the choice of measurement scale of the phenotype [20,21]. Therefore, as a sensitivity analysis, we examined if the above interactions were specific to the chosen measurement scale (an additive-  Note that all GxE tests in the Stage-II sample will be correlated, all GxE tests in the Stage-I sample will be correlated, but that the tests in the Stage-II sample are independent of those in the Stage-I sample. https://doi.org/10.1371/journal.pgen.1010478.g003 scale) by performing logistic regression analyses (on the log-odds scale) in the Stage-II sample for the 'myopia case-control status' phenotype. Of the 6 variants examined, 5 retained evidence of a genotype × UniEdu interaction (P < 0.05 for all variants except KCNQ5 variant rs7744813) and all 6 retained evidence of a genotype × EduYears interaction (P < 0.05) when tested for an association with myopia case-control status (Figs 2 and 3 and S2 Table). Additional sensitivity analyses confirmed that the results were robust to whether statistical adjustment for UniEdu was or was not performed [19] in Step 1 and Step 2 (Box D in S1 Text).
For non-normally distributed traits such as refractive error, SNPs with marginal effects may also be associated with the trait variance (a 'mean-variance relationship'). Heteroskedastic linear mixed models (HLMMs) have been developed to detect variance heterogeneity after accounting for such a relationship [22][23][24]. Pozarickij et al. [25] reported a HLMM-based GWAS analysis for refractive error in the UK Biobank sample, which yielded 14 independent variants with genome-wide significant evidence of variance heterogeneity (P < 5 x 10 −8 ; see Table 3.4 of [25] for details). All 6 of the variants with significant genotype × education  The above tests for genotype-by-education interaction effects were performed under the assumption that SNPs had additive effects. This meant, for example, that the change in phenotype expected for individuals with a University education would be shifted by an amount β 3 in those with 1 copy of the SNP effect allele and by 2β 3 in those with 2 copies of the effect allele (see Eq 1 and Eq 2 in Methods). We previously observed that the LAMA2 lead variant effect allele may act recessively rather than additively [26]. Since dominant or recessive genetic variants tested using an additive model can give rise to spurious evidence of a GxE interaction [24], we explored if the genotype-by-UniEdu interactions of the TOX, GJD2, LAMA2, RBFOX1, KCNQ5 and LRRC4C variants were robust to the choice of model specification. As shown in Table 4 and Fig 4, the most parsimonious model was an additive model for all of the variants, except for LAMA2, which was better fit by a recessive model. However, the evidence for a genotype-by-UniEdu interaction was still evident for the LAMA2 variant under a recessive model, suggesting the evidence for interaction effects did not arise as a result of genetic model misspecification.

Exclusion of gene-environment correlation
SNPs directly associated with both educational attainment and refractive error could potentially yield spurious GxE interaction test findings, via 'gene-environment correlation' (rGE) [21,27]. Dudbridge and Fletcher [27] describe examples of the conditions under which rGE can lead to significant GxE interaction effects despite the genetic marker used in the GxE test having no causal effect on the outcome trait, while Dick et al. [21] discuss approaches for testing GxE interactions in the presence of rGE. As shown in Table 5, the only variant with evidence of an rGE effect was LRRC4C variant, rs11602008. This SNP had a weak association with UniEdu (OR = 1.017; P = 0.034). Thus, the data did not support rGE effects as an explanation for the interaction effects observed in the current study.

Tests for gene-gene interaction
Variance heterogeneity is an expected feature for genetic variants involved in GxG interactions, as well as those involved in GxE interactions [16]. Therefore, each of the 300 possible pairs of SNPs from amongst the 25 SNPs with evidence of variance heterogeneity was tested for a genotype-by-genotype interaction with the avMSE phenotype in the Stage-I sample and Table 4. Tests for genotype-by-education interaction when allowing for dominance effects. Linear regression analyses were performed for the AOSW phenotype in the Stage-II sample for genotype-by-UniEdu interactions when the SNP effect allele was coded as additive (0, 1 2), fully dominant (0, 1, 1) or fully recessive (0, 0, 1). The effect allele is the allele associated with a more myopic refractive error in a linear regression test for an additive marginal effect of the variant.

Variant
Nearest gene with the AOSW phenotype in the Stage-II sample. Only one pair of SNPs had evidence of a GxG interaction after accounting for multiple testing (P < 0.05/300). This was the interaction between rs12193446 × rs7744813, nearby the LAMA2 and KCNQ5 genes, which was strongly associated with AOSW in the Stage-II sample (P = 6.31 x 10 −5 ; Fig C in  there was no evidence of an interaction associated with avMSE for this pair of SNPs in the Stage-I sample (P = 0.97).

Discussion
This work identified 25 genetic variants associated with a difference in refractive error variance across genotypes, after accounting for multiple testing. Of these 25 variants, 19 also exhibited evidence of variance heterogeneity in association with AOSW in an independent sample, confirming them as bona fide vQTL. Consistent with the expectation that refractive error vQTL would be enriched for variants with GxE interaction effects, 6 genetic variants had evidence of genotype-by-education interaction effects associated with AOSW in the Stage-II sample, after accounting for multiple testing. For 5 of the 6 variants, which were located nearby the genes GJD2, LAMA2, RBFOX1, KCNQ5 and LRRC4C, there was independent evidence of a genotype-by-education interaction associated with refractive error in the Stage-I sample. For all variants, greater exposure to education was associated with an increased effect size for the myopia-predisposing risk allele, consistent with evidence that education is a causal risk factor for myopia [10][11][12]28]. Variants in 2 of the 5 genes with robust evidence of a GxE interaction, GJD2 and RBFOX1, have previously been reported to be involved in gene-by-education interactions influencing myopia [14]. The 3 remaining genes identified here (LAMA2, KCNQ5 and LRRC4C) are novel GxE interaction discoveries. The 25 genetic variants associated with refractive error variance are also promising candidates for involvement GxG interactions. However, the evidence for GxG interactions in the current samples was limited, with a single pair of variants displaying evidence of an interaction in the larger Stage-II sample but not in the smaller Stage-I sample (Fig C in S1 Text). The role of GxE interactions in myopia has recently been reviewed [29]. The current work brings the number of independently replicated gene-by-education interactions for myopia to 3 (ZMAT4, GJD2 and RBFOX1). In a meta-analysis of five studies from Singapore, Fan et al. [14] reported that GJD2 variant rs524952 and RBFOX1 variant rs17648524 (both in perfect LD with the variants in GJD2 and RBFOX1 studied here) were associated with a greater risk of myopia in individuals from a high vs. low education stratum. The gene-by-education interaction effect size for these two variants was larger in the East Asian cohorts studied by Fan et al. [14] than the current UK-based sample (β G×E � -0.25 D in the Fan et al. study vs. β G×E � -0.07 D in the current study), consistent with prior findings that GxE effects on myopia are larger in East Asians [15]. Several other genes have also been associated with gene-by-education interactions affecting refractive error: TRPM1, MMP2, SHISA6, DNAH9, ZMAT4, SFRP1, AREG, GABRR1, PDE10A, APLP2, DLX1, BICC1 and A2BP1 [5,14,15,[30][31][32][33]. Of these genes, only ZMAT4 was located nearby the 25 variants selected by our 2-step screening protocol. In the current analysis, rs869422 in ZMAT4 displayed evidence of an interaction with UniEdu and EduYears in the Stage-I sample (Fig 3; UniEdu: β G×E = -0.10 D, P = 1.65 x 10 −3 ; EduYears: β G×E = -0.17 D, P = 3.77 x 10 −3 ) and with EduYears in the Stage-II sample (β G×E = -0.07 years, P = 1.65 x 10 −3 ), however the genotype × UniEdu interaction test in the Stage-II sample was non-significant (P = 0.08). This illustrates the challenge of detecting GxE interaction effects in myopia development.
GJD2 is one of the most intensively studied myopia-susceptibility genes [34][35][36]. GJD2 encodes the neuronal gap junction protein connexin-36 (Cx36), which is thought to play a role in ON-bipolar cell signaling and cone-driven OFF pathways in the retina [35,37,38]. Loss-offunction mutations in connexin-36 in a Zebrafish model inhibit eye growth and diminish the electroretiniogram B-wave amplitude [35]. RBFOX1 encodes a member of the Fox-1 family, which regulate tissue-specific alternative splicing. As well as refractive error [39], RBFOX1 polymorphisms are associated with blood pressure [40], allergy [41] and lung cancer [42]. Presently, it is unclear how Fox-1 proteins confer susceptibility to myopia. KCNQ5 encodes the 'potassium voltage-gated channel subfamily Q member 5' protein, which forms M-type potassium channels in the inner and outer plexiform layers, rod and cone photoreceptors, and the RPE of the primate retina [43]. In a guinea pig myopia model, retinal Kcnq5 gene and protein expression were down-regulated [44]. In adult twins, KCNQ5 variant rs2840795 was associated with electroretiniogram B-wave responses [45]. However, rs2840795 is in weak LD (r 2 = 0.03) with the strongest myopia-predisposing variants in the region, rs7744813. LAMA2 encodes the alpha-2 laminin subunit. Laminin is a major component of basement membranes that has multiple roles, including the attachment of cells to the matrix [46]. LAMA2 variant rs12193446 is associated with refractive error early in life and then progressively through childhood [47], consistent with children's duration of exposure to education. LRRC4C encodes 'leucine rich repeat containing 4C', also known as netrin-G ligand-1 (NGL1). Netrins are axon guidance molecules related to laminin, highly expressed in the central nervous system during embryological development, which attract or repel axons to guide them to their correct target. Netrin-G1 is an atypical netrin, being diffusible rather than tethered to the cell membrane. NGL1 binds intracellularly with PSD-95, a postsynaptic scaffolding protein, and extracellularly with netrin-G1. This protein complex is thought to control the development of distinct populations of neuronal synapses and neural circuits [48]. Other LRRC4C variants are associated with developmental disorders such as autism [49]. Lrrc4c knockout mice display hyperactivity and anxiety-like behaviors [48,49].
We used Levene's median test to assess variance heterogeneity, since previous work has demonstrated this test maintains good control of the type-1 error rate for non-normally distributed phenotypes, of which avMSE and AOSW are examples [17,50]. One potential limitation of Levene's test is that it cannot incorporate covariates, however we addressed this issue by first regressing the phenotype on covariates and then applying Levene's test to the residuals (although we note that this approach may be biased for variants that are correlated with covariates [51]). Moreover, for any trait with a non-normal distribution, there will be a relationship between the mean and the variance of the trait [24]. Genetic variants associated with the variance of a trait purely as a consequence of such a general mean-variance relationship are unlikely to provide mechanistic insight into GxE interactions. Hence, several statistical tests designed to detect residual variance heterogeneity ('dispersion') not accounted for by a general mean-variance relationship have been developed [22][23][24]. Unfortunately, these tests can produce a high false-positive rate for non-normally distributed phenotypes [18] and it has been shown that trait transformation invariably leads to false positives in simulations where SNPs only affect the mean of the phenotype, regardless of the variance heterogeneity test used [18]. Together, these findings suggest there is no ideal method for detecting vQTL for non-normally distributed phenotypes. Nevertheless, in preliminary work [25] in which we applied the heteroskedastic linear model described by Young et al. [24], the 6 variants nearby TOX, GJD2, LAMA2, RBFOX1, KCNQ5 and LRRC4C demonstrated genome-wide significant dispersion effects, i.e. variance heterogeneity not accounted for by a general mean-variance relationship.
One route through which education may influence refractive error is increased near work and accommodation. Children typically show a small deficit ('lag') of accommodation, compared to the level required to focus precisely. The accommodative lag theory of myopia development proposes the hyperopic defocus on the retina accompanying accommodative lag acts as a physiological signal for increased axial elongation [52]. However, evidence for the accommodative lag theory is mixed [52][53][54][55] and other aspects of the near visual environment have been put forward to explain the link between near work and myopia [56][57][58]. Educational activities typically take place indoors. Since insufficient time outdoors is an established risk factor for myopia [9,59,60], exposure to education may also serve as a proxy for time spent indoors.
Strengths of the current study include the large sample size, the use of highly standardised methods for assessing phenotypes and risk factors, and the exclusion of gene-environment correlation effects. Limitations of the study were that autorefraction was not carried out on all UK Biobank participants, which reduced the sample size for the Stage-I sample, and that-in order to maintain high statistical power-attention was focused on a set of 25 variants enriched for myopia-associated GxE interaction effects. This approach would have excluded variants that did not have both a marginal effect and a variance heterogeneity association with refractive error, such as variants with 'cross-over' type GxE interaction effects in which the genetic effect cancels out under different environmental conditions (right panel of Fig 1). A further limitation was that the participants were adults 50-60 years of age from the UK. There has been a steady increase in years spent in education in many parts of the world over the past few decades. Therefore, studying gene-environment interactions in individuals who have more recently undergone their education or those in countries with more intensive education systems may be beneficial in detecting the full impact of GxE interactions in contemporary populations.
In summary, we identified 25 genetic variants with evidence of variance heterogeneity in their association with refractive error. Nineteen of the 25 variants also demonstrated evidence of variance heterogeneity for AOSW in an independent sample. These vQTLs are strong candidates for having either GxE or GxG interaction effects and, indeed, genetic variants located near the GJD2, LAMA2, RBFOX1, KCNQ5 and LRRC4C genes were associated with a progressively increasing risk of myopia as the number of years of schooling rose. Three of these geneeducation interaction findings were novel (those implicating LAMA2, KCNQ5 and LRRC4C), while the remaining two supported interactions identified previously in cohorts from East Asia. More research is needed to understand the biological pathways through which these 5 variants act, and how they interact with the effect of near work, intensive education, or insufficient time outdoors (for which education may be a proxy).

Ethics statement
The UK Biobank study had ethical approval from the United Kingdom National Health Service (NHS) Research Ethics Committee (Reference: 11/NW/0382). Signed and informed consent was obtained from all of the participants.
Full details of the methods are provided in S1 Text. Unless stated otherwise, all tests were performed in R [61].

Participants, phenotype and environmental variables
Analyses were performed on data from participants in UK Biobank, a prospective study examining the health and wellbeing of 500,000 adults aged 40-70 years-old living in the United Kingdom [62]. Baseline assessment visits occurred between 2006-2010. Information about education level and age of completing full-time education were obtained from a questionnaire. The binary variable UniEdu was used to indicate whether individuals had a University or college degree. The variable EduYears was used to classify the age at which the participants completed their full-time education [11]. Participants self-reported their age-of-onset of spectacle wear (AOSW) [63], which was coded as a continuous phenotype. An ophthalmic assessment was introduced into UK Biobank only towards the later stages of recruitment; approximately 23% of participants were assessed [64]. The refractive error phenotype (avMSE) was calculated as the average spherical equivalent of both eyes from non-cycloplegic autorefraction. DNA extraction, genotyping and imputation were performed as described [65]. A 'Stage-I' sample of unrelated participants of European ancestry (N = 88,334) with information for avMSE, Uni-Edu and EduYears was selected. Participants were classified as myopic if they had an avMSE � -0.50 D [66]. A Stage-II sample of European-ancestry participants (N = 252,838) was selected who were unrelated to each other, unrelated to any person in the Stage-I sample, and who had information available for AOSW, UniEdu and EduYears. (Under this classification scheme, participants who did not wear spectacles, and therefore did not report an AOSW, were not included in the Stage-II sample). Participants in the Stage-II sample were classified as myopic if they had an AOSW greater than 5 years and less than or equal to 25 years [67]. There were too few participants of non-European ancestry to study GxE interactions in other ancestry groups.

Two-step screening strategy for identifying putative GxE interaction variants
Step 1 was a standard GWAS for the phenotype avMSE in the Stage-I sample (Fig 1A), implemented with BOLT-LMM [68]. Sex, age, age-squared, a binary indicator of the genotype array (UK BiLEVE Axiom or UK Biobank Axiom array) and the first 10 ancestry principal components (PCs) were included as covariates. Approximately 7 million imputed genetic variants with a minor allele frequency >5% were tested. Independently associated SNPs were selected by clumping with PLINK [69]. SNPs associated with avMSE at the lenient threshold of P < 1 x 10 −4 were taken forward to Step 2, which was a variance heterogeneity analysis for the phenotype avMSE in the Stage-I sample (Fig 1B) using Levene's median test, implemented with OSCA [17] and following the approach recommended by Zhang et al. [19]. In total, 956 SNPs were taken forward to Step 2. A Bonferroni correction for multiple comparisons was applied (α = 0.05/956 = 5.23 × 10 −5 ). In the absence, worldwide, of a study cohort of comparable size to UK Biobank with data available for avMSE and high-density genotypes, confirmation of the variance heterogeneity findings in the Stage-I sample were sought by performing Levene's test for the trait AOSW in the Stage-II sample. Consideration of the validity of AOSW as a surrogate phenotype for refractive error when testing for gene-by-education interaction effects is discussed in Box C in S1 Text.

Gene-environment interaction tests
For each of the 25 SNPs identified by the 2-step screening protocol, linear regression models with an interaction term were fitted in the Stage-I sample (Eq 1) and the Stage-II sample (Eq 2): Where, avMSE and AOSW are n × 1 vectors of refractive error and age-of-onset of spectacle wear values in the n participants in the Stage-I sample and Stage-II sample, respectively. SNP is a n × 1 vector of genotype counts (0, 1 or 2), UniEdu is a n × 1 vector binary (0,1) variable for University level education, C is a n × k matrix of covariates (age, age-squared, genotyping array, and the first 10 ancestry PCs), γ is a 1 × k vector of regression coefficients, and ε and π are residuals assumed to be normally distributed. β 0 is an intercept, while β 1 , β 2 and β 3 are the regression coefficients for the main effect for the SNP, the main effect for UniEdu and the SNP × UniEdu interaction effect, respectively (likewise, for δ 0 , δ 1 , δ 2 and δ 3 ). Linear regression models of the same form, but with robust standard errors, were also fitted to test for genotypeby-EduYears interactions associated with either avMSE or AOSW.
Analogously, logistic regression models of the same form to Eq 1 and Eq 2 above were applied to test for genotype-by-education interaction effects associated with the outcome variable Myopic (1 = myopic, 0 = non-myopic). Simulations were used to confirm that the type-1 error rate of the genotype-by-education test for the AOSW phenotype was well-controlled under the test conditions, despite the non-normal distribution of AOSW and in the presence of GxE interactions not involving education (Box A in S1 Text).

Gene-gene interaction tests
SNP × SNP interaction tests were performed to examine if any pair of SNPs from amongst the 25 SNPs identified using the 2-step screening strategy had evidence of a genotype × genotype interaction. A linear regression model with an interaction term was fitted for each pair of variants in turn, as follows: Where terms are defined as above. The avMSE phenotype was tested in the Stage-I sample and the AOSW phenotype was tested in the Stage-II sample. A Bonferroni correction for multiple comparisons was applied to identify δ 3 or β 3 terms showing evidence of association, using an alpha value of α = 0.05/300 = 0.00017 (accounting for a total of 25×25 / 2 tests).

Gene-environment correlation tests
To test for gene-environment correlation, the following logistic regression model was fitted: Where, ω 0 is an intercept and ω 1 quantifies the association between the SNP genotype and having a University degree.

Sensitivity analyses
We carried out simulations to assess the type-1 error rate of Levene's median test for variance heterogeneity with the avMSE phenotype in the Stage-I sample and with the AOSW phenotype in the Stage-II sample, as well as the type-1 error rate of linear regression when testing for a SNP × UniEdu interaction with the AOSW phenotype in the Stage-II sample.
Supporting information S1 Table. Summary statistics for the 956 SNPs independently associated with refractive error in the GWAS for avMSE in the Stage-I sample.
(XLSX) S2 Table. Marginal effect genetic association tests and genotype-by-UniEdu and genotypeby-EduYears interaction tests for refractive error-related phenotypes (avMSE, AOSW, and Myopic) in the Stage-I sample and Stage-II sample.
(XLSX) S1 Text. Box A in S1 Text. Type-1 error rates. Box B in S1 Text. Validity of University education as an index of educational intensity. Box C in S1 Text. Validity of age-of-spectacles-wear (AOSW) as a surrogate for refractive error (avMSE) in GxE interaction tests. Box D in S1 Text. Comparison of results of sensitivity analyses. Box E in S1 Text. Supplementary Methods. Fig A in S1 Text. Manhattan plot of the results from the GWAS for refractive error (avMSE) in the Stage-I sample. Fig B in S1 Text. SNP genotype by University education (GxE) interactions associated with age-of-onset of spectacle wear (AOSW). Fig C in S1 Text. Summary of the evidence for SNP × SNP interactions contributing to myopia development. (PDF)